Art Unit: 2655

DETAILED ACTION 
Claim Objections 

1. Claim 23 is objected to because of the following informalities: "claim 23" in line 1 should
be corrected to "claim 22."

Appropriate correction is required. 

Claim Rejections - 35 USC §102 

2. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the
basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(e) the invention was described in (1) an application for patent, published under section 122(b), by another filed 
in the United States before the invention by the applicant for patent or (2) a patent granted on an application for 
patent by another filed in the United States before the invention by the applicant for patent, except that an 
international application filed under the treaty defined in section 351(a) shall have the effects for purposes of this
subsection of an application filed in the United States only if the international application designated the United
States and was published under Article 21(2) of such treaty in the English language.

3. Claims 1-3, 5-7, 10, and 11 are rejected under 35 U.S.C. 102(e) as being anticipated by
Bond et al (U.S. Patent: 6,539,348).

With respect to Claim 1, Bond discloses:
Receiving the input string (Col. 3, Lines 9-10); 

Segmenting the input string into one or more proposed tokens (Col. 3, Lines 21-29); 
Validating the proposed tokens by submitting the proposed tokens to a linguistic
knowledge component to determine whether the proposed tokens represent linguistically 
meaningful units (Col. 3, Lines 29-48); and 

If not, re-segmenting the input string into one or more different proposed tokens (Col. 5, 
Lines 45-61). 
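For illustration only, the segment, validate, and re-segment steps recited above can be sketched as a loop. This is a minimal sketch under stated assumptions, not the method of the application or of Bond: the toy `LEXICON`, the `is_valid` check standing in for a linguistic knowledge component, and the two entries in `CRITERIA` are all hypothetical.

```python
import re

# Hypothetical stand-in for a linguistic knowledge component.
LEXICON = {"can", "not", "don't", "stop"}

def is_valid(token):
    """Treat a token as linguistically meaningful if the toy lexicon has it."""
    return token.lower() in LEXICON

# Assumed segmentation criteria, tried in order.
CRITERIA = [
    lambda s: s.split(),                                   # split at white space
    lambda s: re.findall(r"[A-Za-z']+|[^A-Za-z'\s]", s),   # split off punctuation
]

def tokenize(text):
    """Propose tokens, validate them, and re-segment the ones that fail."""
    proposed = [text]
    for criterion in CRITERIA:
        next_round = []
        for tok in proposed:
            if is_valid(tok):
                next_round.append(tok)             # keep validated tokens
            else:
                next_round.extend(criterion(tok))  # re-segment the rest
        proposed = next_round
        if all(is_valid(t) for t in proposed):
            break                                  # everything validated
    return proposed

print(tokenize("don't stop!"))
```

The loop stops either when every proposed token validates or when the assumed list of criteria is exhausted, so unvalidated pieces such as the final "!" are returned as-is.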

With respect to Claim 2, Bond recites: 

Accessing segmentation criteria arranged in a predetermined hierarchy of segmentation 
criteria, and segmenting based on the segmentation criteria in an order based on the hierarchy 
(Col 10, Lines 11-24). 

With respect to Claim 3, Bond discloses: 

Accessing language-specific data containing a portion of the segmentation criteria (Col 
7, Lines 19-23). 

With respect to Claim 5, Bond discloses: 

Validating and re-segmenting until all characters in the input string have been validated 
or until the predetermined hierarchy of segmentation criteria has been exhausted (Col. 2, Lines 3- 

a). 

With respect to Claim 6, Bond recites: 

Accessing the lexicon to determine whether it contains the proposed tokens (Col 3, Lines
29-48).

With respect to Claim 7, Bond discloses: 

Invoking the morphological analyzer to convert a form of the proposed tokens to a 
morphologically different form (Col 3, Lines 45-50); and 
Accessing the lexicon to determine whether it contains the morphologically different
form of the token (Col 3, Lines 50-52). 
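As a rough illustration of the two steps recited for Claim 7, a token can be converted to a morphologically different form and that form looked up in a lexicon. The sketch below is an assumption-laden toy: the `LEXICON` contents and the three suffix rules in `candidate_forms` are hypothetical and do not represent the analyzer described in the application or in Bond.

```python
# Hypothetical toy lexicon of base forms.
LEXICON = {"walk", "city", "run"}

def candidate_forms(token):
    """Yield the token itself, then a few naively derived base forms."""
    t = token.lower()
    yield t
    if t.endswith("ies"):
        yield t[:-3] + "y"   # cities -> city
    if t.endswith("ed"):
        yield t[:-2]         # walked -> walk
    if t.endswith("s"):
        yield t[:-1]         # runs -> run

def validate(token):
    """Return True if the token or one of its derived forms is in the lexicon."""
    return any(form in LEXICON for form in candidate_forms(token))

print(validate("walked"), validate("Cities"), validate("xyzzy"))
```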

Claim 10 contains subject matter similar to Claims 1 and 7, and thus, is rejected for the 
same reasons. 

With respect to Claim 11, Bond recites: 

Repeating the steps of proposing a subsequent segmentation and submitting the 
subsequent segmentation to the linguistic knowledge component until the portion of the input 
string is validated or the portion of the input string has been segmented according to a 
predetermined number of segmentation criteria (Col 10, Lines 11-24). 

Claim Rejections - 35 USC §103

4. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in 
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are 
such that the subject matter as a whole would have been obvious at the time the invention was made to a person 
having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the 
manner in which the invention was made. 

5. Claims 4, 8, 9, 12, and 17-19 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Bond et al in view of Carus (U.S. Patent: 5,890,103).

With respect to Claim 4, Bond teaches the natural language processing system utilizing a 
dictionary lookup-process in determining a proper sentence segmentation format, as applied to 
Claim 1. Bond does not specifically suggest language-dependent punctuation data in a
segmentation process; however, Carus recites:
Accessing a precedence hierarchy of punctuation in the language-specific data, the
precedence hierarchy being arranged based on binding properties of the punctuation in the 
precedence hierarchy, and segmenting the input string based on the punctuation in an order based 
on the precedence hierarchy (segmentation based on punctuation placement for specific 
languages, Col 42, Lines 10-37). 

Bond and Carus are analogous art because they are from a similar field of endeavor in 
natural language processing. Thus, it would have been obvious to a person of ordinary skill in 
the art, at the time of invention, to modify the teachings of Bond with the use of language 
specific segmentation rules based on punctuation placement as taught by Carus to implement 
higher level linguistic processing (Carus, Col 2, Lines 9-16) in order to prevent incorrect 
segmentation by identifying special characters (Carus, Col 39, Lines 23-28) that could have 
different meanings for a specific language based upon character location. 
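The idea of segmenting in an order given by a punctuation precedence hierarchy can be sketched as follows. This is a hedged illustration only: the ordering in `PRECEDENCE` (loosest-binding mark first) and the `is_valid` callback are hypothetical and are not taken from Carus or from the application.

```python
import re

# Assumed hierarchy: marks listed earlier bind less tightly and are split first.
PRECEDENCE = ["/", "-", "'"]

def split_by_precedence(token, is_valid, marks=PRECEDENCE):
    """Split an invalid token on each mark in precedence order, recursively."""
    if is_valid(token) or not marks:
        return [token]
    mark, rest = marks[0], marks[1:]
    if mark not in token or token == mark:
        return split_by_precedence(token, is_valid, rest)
    # The capturing group keeps the punctuation mark as its own subtoken.
    pieces = [p for p in re.split(f"({re.escape(mark)})", token) if p]
    out = []
    for piece in pieces:
        out.extend(split_by_precedence(piece, is_valid, rest))
    return out

lexicon = {"male", "female", "well", "known"}
print(split_by_precedence("male/female", lambda t: t in lexicon))
```

Tokens that already validate are never split, so a hyphenated entry present in the lexicon would survive intact under this sketch.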

With respect to Claim 8, Bond teaches the natural language processing system utilizing a 
dictionary lookup-process in determining a proper sentence segmentation format, as applied to 
Claim 1, while Carus teaches the use of language specific segmentation rules based on 
punctuation placement as applied to Claim 4. 

Bond and Carus are obvious in combination for the reasons noted with respect to Claim 4.

Claim 9 contains subject matter similar to Claim 5, and thus, is rejected for the same 
reasons. 

With respect to Claim 12, Bond teaches the natural language processing system utilizing 
a dictionary lookup-process in determining a proper sentence segmentation format, as applied to 
Claim 10. Bond does not specifically suggest segmenting an input string at white spaces;
however, such a segmenting technique is well known and commonly used in the art as a more
basic means of parsing input text, as is evidenced by Carus (Col 13, Lines 62-64).

Bond and Carus are analogous art because they are from a similar field of endeavor in
natural language processing. Thus, it would have been obvious to a person of ordinary skill in
the art, at the time of invention, to modify the teachings of Bond with the means of segmenting
an input text string at white spaces as taught by Carus to provide basic initial segmentation of an
input text, based on white spaces, for further linguistic analysis. 

With respect to Claim 17, Carus discloses the means of segmenting an input text string at
white spaces, as applied to Claim 12. 

With respect to Claim 18, Bond additionally discloses: 

Determining whether the token contains either all alpha characters or all numeric 
characters; and if so, indicating that the token cannot be further segmented and will be treated as 
an unrecognized word (acronym containing all capital letters marked as an unknown word and
assigned a token, Col. 3, Lines 45-61). 

With respect to Claim 19, Carus further recites:

Determining whether the token includes final punctuation; and if so, segmenting the 
token into a subtoken by splitting off the final punctuation (Jones', Col 41, Lines 1-10).

6. Claims 13 and 14 are rejected under 35 U.S.C. 103(a) as being unpatentable over Bond 
et al in view of Carus, and further in view of Grefenstette (U.S. Patent: 6,289,304).
With respect to Claim 13, Bond in view of Carus teaches the natural language processing
system utilizing a dictionary lookup-process in determining a proper sentence segmentation 
format and a means of segmenting an input text string at white spaces, as applied to Claim 12. 
Bond in view of Carus does not teach the detection and segmentation of emoticons, however, 
Grefenstette discloses: 

Determining whether invalid tokens contain any of a predetermined plurality of multi- 
character punctuation strings or emoticons and, if so, segmenting the tokens into subtokens based 
on the multi-character punctuation strings or emoticons ("smileys," Col 4, Line 62 - Col 5, Line
4).

Bond, Carus, and Grefenstette are analogous art because they are from a similar field of 
endeavor in natural language processing. Thus, it would have been obvious to a person of 
ordinary skill in the art, at the time of invention, to modify the teachings of Bond in view of 
Carus with the means of parsing emoticons in a text string as taught by Grefenstette in order to 
increase natural language processing system capabilities by implementing a means for 
recognizing and segmenting emoticons which would otherwise have no meaning in a traditional 
lexicon. 
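By way of a hedged example, splitting a token around a predetermined set of multi-character strings might look like the sketch below. The `EMOTICONS` list is hypothetical and is not drawn from Grefenstette or from the application; longer strings are tried first so that a shorter entry does not pre-empt a longer one.

```python
import re

# Hypothetical predetermined set of emoticons and multi-character punctuation.
EMOTICONS = [":-)", ":)", ":-(", ";-)", "..."]

# Build one alternation, longest entries first; the capturing group makes
# re.split return the matched strings along with the text between them.
_PATTERN = re.compile(
    "(" + "|".join(re.escape(e) for e in sorted(EMOTICONS, key=len, reverse=True)) + ")"
)

def split_emoticons(token):
    """Split a token into subtokens around any listed emoticon."""
    return [piece for piece in _PATTERN.split(token) if piece]

print(split_emoticons("hello:-)"))
```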

With respect to Claim 14, Carus additionally recites: 

Determining whether invalid tokens contain punctuation marks; and if so, segmenting the 
tokens into subtokens according to a predetermined precedence hierarchy of punctuation 
(detecting an apostrophe within a text string and segmenting text based on apostrophe location, 
Col 40, Lines 7-50, and Col 42, Lines 10-37). 



Application/Control Number: 09/822,976

7. Claims 15, 16, and 21-23 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Bond et al in view of Carus, further in view of Grefenstette, and in yet further view of Malsheen
et al (U.S. Patent: 5,634,084).

With respect to Claim 15, Bond in view of Carus, and in further view of Grefenstette,
teaches the natural language processing system capable of detecting punctuation marks within a
token, as applied to Claim 14. Bond in view of Carus, and in further view of Grefenstette, does
not teach determining whether a token contains both alpha and numeric characters and
segmenting a string containing such characters at alpha-numeric boundaries; however, Malsheen
suggests:

Determining whether invalid tokens contain both alpha and numeric characters; and if so, 
segmenting the tokens into subtokens at boundaries between the alpha and numeric characters in 
the tokens (parsing syllables that can consist of a combination of letters and numbers, Abstract).

Bond, Carus, Grefenstette, and Malsheen are analogous art because they are from a
similar field of endeavor in linguistic processing. Thus, it would have been obvious to a person
of ordinary skill in the art, at the time of invention, to modify the teachings of Bond in view of
Carus, and in further view of Grefenstette, with the ability to determine whether a token contains
both alpha and numeric characters and segment a string containing such characters as taught by 
Malsheen in order to improve natural language processing system capabilities by parsing alpha- 
numeric words which would not be separated using conventional text processing (Malsheen, Col 
2, Lines 53-59). 
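Segmenting at boundaries between alpha and numeric characters can be illustrated with a short sketch. The function name `split_alnum` and the restriction to ASCII letters and digits are assumptions made for the example; it is not the technique of Malsheen or of the application.

```python
import re

def split_alnum(token):
    """Split a token into runs of letters, runs of digits, and other runs."""
    return re.findall(r"[A-Za-z]+|[0-9]+|[^A-Za-z0-9]+", token)

print(split_alnum("24kg"))
print(split_alnum("B52s"))
```

A token made up only of letters or only of digits comes back as a single run, which matches the case in which no further segmentation is possible.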

With respect to Claim 16, Bond further recites: 
Reassembling previously segmented subtokens (producing a revised token, Col 3, Lines
45-61).

Claim 21 contains subject matter similar to Claim 13, and thus, is rejected for the same 
reasons. 

With respect to Claim 22, Carus additionally recites:

Determining whether the token includes one or more edge punctuation marks; and if so, 
segmenting the token into subtokens by splitting off the one or more edge punctuation marks 
according to a predetermined edge punctuation precedence hierarchy ('twas, Col 40, Lines 7- 
23). 

With respect to Claim 23, Carus additionally recites:

Determining whether the token includes one or more internal punctuation marks, internal 
to the tokens; and if so, segmenting the token into subtokens based on the one or more internal 
punctuation marks according to a predetermined internal punctuation precedence hierarchy 
(male/female token, Col 37, Line 63- Col 38, Line 6). 

8. Claim 20 is rejected under 35 U.S.C. 103(a) as being unpatentable over Bond et al in
view of Carus, and further in view of Malsheen et al.

With respect to Claim 20, Bond in view of Carus teaches the natural language processing
system capable of detecting final punctuation marks within a token, as applied to Claim 19, 
while Malsheen teaches the ability to determine whether a token contains both alpha and numeric 
characters and segment a string containing such characters as applied to Claim 15. 
Bond, Carus, and Malsheen are obvious in combination for the reasons noted with respect
to Claim 15. 

Conclusion 

9. The prior art made of record and not relied upon is considered pertinent to applicant's 
disclosure: 

• Parra (U.S. Patent: 5,870,700) - teaches a means of additional language specific
natural language token processing.

• Williams (U.S. Patent: 5,963,742) - teaches the use of multiple subparsers that
each differently interpret tokens.

• Newsted (U.S. Patent: 6,016,467) - teaches the use of punctuation characters to
separate tokens.

10. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to James S. Wozniak whose telephone number is (703) 305-8669 
and email is James.Wozniak@uspto.gov. The examiner can normally be reached on Mondays- 
Fridays, 8:30-4:30. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Doris To, can be reached at (703) 305-4827. The fax/phone number for the
Technology Center 2600 where this application is assigned is (703) 872-9306. 
Any inquiry of a general nature or relating to the status of this application or proceeding
should be directed to the technology center receptionist whose telephone number is
(703) 306-0377.

James S. Wozniak 
10/7/2004 